FOREWORD


New  methods  for  training  computational  neural  networks  for  dynamic  system 
identification  and  control  have  been  created,  performance  of  the  training  algorithms  has 
been  analyzed,  and  the  resulting  neural  networks  have  been  evaluated.  Computational 
neural  networks  are  shown  to  have  excellent  potential  for  identifying  the  dynamic  models 
of  nonlinear  systems  and  for  controlling  such  systems  over  their  entire  operating  space. 
Three  topics  were  addressed: 

•  Aerodynamic  model  identification  using  sigmoid  and  radial-basis-function  networks 

•  Control of the preferential oxidizer for a fuel-cell power system using a neural network

•  Initializing a neural network (nonlinear) controller so that it replicates the characteristics of a gain-scheduled linear controller.

This  research  produced  new  training  approaches  that  will  allow  future  dynamic  systems 
to  work  with  higher  accuracy,  greater  efficiency,  and  improved  reliability. 
1.  Statement of the Problem


Computational neural networks are motivated by input-output and learning properties of biological neural systems, though in mathematical application the network becomes an abstraction that may bear little resemblance to its biological model. Computational neural networks consist of nodes that simulate the neurons and weighting factors that simulate the synapses of a living nervous system. The nodes are nonlinear basis functions, and the weights contain knowledge of the system. Neural networks are good candidates for performing a variety of functions in intelligent control systems because they are potentially very fast (in parallel hardware implementation), they are intrinsically nonlinear, they can address problems of high dimension, and they can learn from experience. From the biological analogy, the neurons are modeled as switching functions that take just two discrete values; however, "switching" may be softened to "saturation," not only to facilitate learning of the synaptic weights but also to admit the modeling of continuous, differentiable functions. Furthermore, other nonlinear functions, such as radial basis functions and wavelets, can be used as activation functions.

The neural networks receiving most current attention are static expressions that perform one of two functions. The first is to approximate multivariate functions of the form

$$ y = h(x) \tag{1} $$

where $x$ and $y$ are input and output vectors and $h(\cdot)$ is the (possibly unknown) relationship between them. Neural networks can be viewed as generalized spline functions that identify efficient input-output mappings from observations. The second application is to classify attributes, much like decision trees.

An N-layer feed-forward neural network (FNN) represents the function by a sequence of operations,

$$ r^{(k)} = s^{(k)}\left[W^{(k-1)} r^{(k-1)}\right] \triangleq s^{(k)}\left[\eta^{(k)}\right], \quad k = 1 \text{ to } N \tag{2} $$

where $y = r^{(N)}$ and $x = r^{(0)}$. $W^{(k-1)}$ is a matrix of weighting factors determined by the learning process, and $s^{(k)}[\cdot]$ is an activation-function vector whose elements normally are identical, scalar, nonlinear functions $\sigma_i(\eta_i)$ appearing at each node:

$$ s^{(k)}\left[\eta^{(k)}\right] = \left[\sigma_1(\eta_1^{(k)}) \;\cdots\; \sigma_n(\eta_n^{(k)})\right]^T \tag{3} $$

One of the inputs to each layer may be a unity threshold element that adjusts the bias of the layer's output. Networks consisting solely of linear activation functions are of little interest, as they merely perform a linear transformation $H$, thus limiting eq. 1 to the form $y = Hx$. Figure 1 represents two simple feed-forward neural networks. Each circle represents an arbitrary, scalar, nonlinear function $\sigma_i(\cdot)$ operating on the sum of its inputs, and each arrow transmits a signal from the previous node, multiplied by a weighting factor.
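The layer-by-layer operation of eq. 2 can be sketched in a few lines of Python. This is only an illustration: the weight values, the choice of tanh for the hidden layer, a linear output node, and the helper names `layer` and `fnn` are assumptions made for the example, not taken from the work summarized here.

```python
import math

def layer(W, r, sigma):
    """One step of eq. 2: eta = W * r_aug, then sigma applied element-wise."""
    r_aug = list(r) + [1.0]  # unity threshold element that adjusts the bias
    eta = [sum(w * v for w, v in zip(row, r_aug)) for row in W]
    return [sigma(e) for e in eta]

def fnn(x, weights, sigmas):
    """Propagate the input x = r(0) through all layers to obtain y = r(N)."""
    r = list(x)
    for W, sigma in zip(weights, sigmas):
        r = layer(W, r, sigma)
    return r

def identity(e):
    return e

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output (N = 2).
# The last column of each weight matrix multiplies the threshold element.
W0 = [[1.0, -1.0, 0.0],
      [0.5, 0.5, -0.5]]
W1 = [[1.0, -2.0, 0.1]]

y = fnn([0.3, -0.2], [W0, W1], [math.tanh, identity])

# With linear activations only, the layers collapse into a single
# linear (here affine, because of the threshold) transformation.
y_lin = fnn([0.3, -0.2], [W0, W1], [identity, identity])
```

The second call illustrates why purely linear networks are of little interest: no matter how many layers are stacked, the result is a single fixed linear map of the input.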

More than one set of weights could produce the same functional relationship between $x$ and $y$. Training sessions starting at different points could produce different sets of weights that yield identical outputs. The result presages a well-known problem of network weight determination: multiple local minima in error-minimizing solutions that may prevent the identification of the best network for representing a given function. The unstructured feed-forward network may not have compact support (i.e., its weights may have global effects) if its basis functions do not vanish for large magnitudes of their arguments.


a)  Single-Input/Single-Output  Network,  b)  Double-Input/Single-Output  Network. 


Figure  1.  Two  Feed-forward  Neural  Networks. 

The sigmoid is commonly used as the artificial neuron. It is a saturating function defined variously as $\sigma(\eta) = 1/(1 + e^{-\eta})$ for output in (0,1) or $\sigma(\eta) = (1 - e^{-2\eta})/(1 + e^{-2\eta}) = \tanh \eta$ for output in (-1,1). Recent results indicate that any continuous mapping can be approximated arbitrarily closely with sigmoidal networks containing a single hidden layer (N = 2). Symmetric activation functions like the Gaussian radial basis function ($\sigma(\eta) = e^{-r^T W r}$) have better convergence properties for many functions and have more compact support as a consequence of near-orthogonality. In such case, eq. 2 is rewritten as

$$ r^{(k)} = s^{(k)}\left[r^{(k-1)T}\, W^{(k-1)}\, r^{(k-1)}\right] = s^{(k)}\left[\eta^{(k)}\right], \quad k = 1 \text{ to } N \tag{4} $$
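As a quick check of the activation functions just described, the following Python sketch evaluates both sigmoid forms and a Gaussian radial basis function. The `center` and `width` parameters of the Gaussian are assumptions added for illustration (they correspond to a shifted input and an isotropic weighting matrix); the text does not specify them.

```python
import math

def logistic(eta):
    """Sigmoid with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def bipolar(eta):
    """Sigmoid with output in (-1, 1); algebraically equal to tanh(eta)."""
    return (1.0 - math.exp(-2.0 * eta)) / (1.0 + math.exp(-2.0 * eta))

def gaussian_rbf(r, center, width=1.0):
    """Gaussian bump exp(-|r - center|^2 / width^2); center and width are assumed."""
    d2 = sum((ri - ci) ** 2 for ri, ci in zip(r, center))
    return math.exp(-d2 / width ** 2)

# The sigmoid saturates but never returns to zero, so its effect is global,
# whereas the Gaussian response is negligible far from its center.
far_sigmoid = logistic(5.0)
far_rbf = gaussian_rbf([5.0, 0.0], [0.0, 0.0])
```

Comparing `far_sigmoid` with `far_rbf` shows the difference in support: the sigmoid still contributes almost its full output five units from the origin, while the Gaussian contribution has essentially vanished.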

In control application, neural networks perform functions analogous to gain scheduling or nonlinear control. Consider the simple two-input network of Fig. 1b. The scalar output and derivative of a single sigmoid with unit weights are shown in Fig. 2. If u is a fast variable and v is a slow variable, choosing the proper weights on the inputs and threshold can produce a gain schedule that is approximately linear in one region and nonlinear (with an inflection point) in another. More complex surfaces can be generated by increasing the number of sigmoids. If u and v are both fast variables, then the sigmoid can represent a generalization of their nonlinear interaction. In that regard, the FNN may represent a nonlinear control function directly, or it may represent a nonlinear dynamic model that is to be inverted by separate control logic.
a)  Sigmoid.  b)  x-Derivative of Sigmoid.


Figure  2.  Example  of  Sigmoid  Output  with  Two  Inputs. 

For  comparison,  a  typical  radial  basis  function  produces  the  output  shown  in  Fig.  3. 
Whereas  the  sigmoid  has  a  preferred  input  axis  and  simple  curvature,  the  RBF  admits 
more  complex  curvature  of  the  output  surface,  and  its  effect  is  more  localized.  The  most 
efficient  nodal  activation  function  depends  on  the  general  shape  of  the  surface  to  be 
approximated,  as  well  as  on  the  importance  of  compact  support. 

The cerebellar model articulation controller (CMAC) is an alternate network formulation with somewhat different properties but similar promise for application in control systems. The CMAC performs table look-up of a nonlinear function over a particular region of function space. CMAC operation can be split into two mappings. The first maps each input into an association space A. The mapping generates a selector vector $a$ from overlapping receptive regions for the input. The second mapping, R, goes from the selector vector $a$ to the scalar output $y$ through the weight vector $w$, which is derived from training:

$$ y = w^T a \tag{5} $$

Training  is  inherently  local,  as  the  extent  of  the  receptive  regions  is  fixed.  The  CMAC 
has  quantized  output,  producing  "staircased"  rather  than  continuous  output. 
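The two CMAC mappings can be illustrated with a minimal one-dimensional sketch. Everything specific here is an assumption chosen for the example: the class name `CMAC1D`, the number of cells and overlapping tilings, the learning rate `beta`, the sine target, and the simple error-correction update. No hashing is included.

```python
import math

class CMAC1D:
    """Minimal 1-D CMAC with overlapping, offset receptive regions."""

    def __init__(self, x_min, x_max, n_cells=25, n_tilings=8):
        self.x_min = x_min
        self.width = (x_max - x_min) / n_cells  # extent of one receptive region
        self.n_tilings = n_tilings
        self.n_cells = n_cells + 1              # extra cell absorbs the offsets
        self.w = [0.0] * (self.n_tilings * self.n_cells)

    def active(self, x):
        """First mapping: indices of the nonzero entries of the selector vector a."""
        idx = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            cell = int((x - self.x_min + offset) / self.width)
            cell = min(max(cell, 0), self.n_cells - 1)
            idx.append(t * self.n_cells + cell)
        return idx

    def output(self, x):
        """Second mapping: y = w^T a, the sum of the active weights."""
        return sum(self.w[i] for i in self.active(x))

    def train(self, x, target, beta=0.5):
        """Spread a fraction beta of the output error over the active weights."""
        idx = self.active(x)
        err = target - self.output(x)
        for i in idx:
            self.w[i] += beta * err / len(idx)
        return err

# Learn sin(x) on [0, 2*pi] from one sample per quantization interval.
cmac = CMAC1D(0.0, 2.0 * math.pi)
xs = [(k + 0.5) * 2.0 * math.pi / 200 for k in range(200)]
for _ in range(50):
    for x in xs:
        cmac.train(x, math.sin(x))
max_err = max(abs(cmac.output(x) - math.sin(x)) for x in xs)

# Locality: a single update near x = 1 leaves the output near x = 5 untouched.
fresh = CMAC1D(0.0, 2.0 * math.pi)
fresh.train(1.0, 1.0)
```

Because each input activates only the cells whose receptive regions contain it, an update changes the output only within one region width of the training point, and the output is piecewise constant between quantization boundaries.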

The FNN and CMAC are both examples of instantaneous networks, that is, their outputs are essentially instantaneous: given an input, the speed of output depends only on the speed of the computer. Dynamic networks rely on stable resonance of the network about an equilibrium condition to relate a fixed set of initial conditions to a steady-state output. Bidirectional Associative Memory (BAM) networks are nonlinear dynamical systems that subsume Hopfield networks, Adaptive-Resonance-Theory (ART) networks, and Kohonen networks. Like FNN, they use binary or sigmoidal neurons and store knowledge in the weights that connect them; however, the "neural circuits" take time to stabilize on an output. While dynamic networks may operate more like biological neurons, which have a refractory period between differing outputs, computational delay degrades control functions; hence, instantaneous networks are preferred in many identification and control applications.
a)  Radial Basis Function (RBF).  b)  x-Derivative of RBF.


Figure  3.  Example  of  Radial  Basis  Function  Output  with  Two  Inputs. 

1.1  Aerodynamic  Model  Identification 

Aerodynamic model identification is an example of the generic multivariate function approximation problem with certain constraints on the training space. The multivariate functions - in this case, aerodynamic coefficients that have nonlinear dependence on several variables - are embedded in high-order ordinary differential equations. The goal is to derive the aerodynamic models from measurements of physically realizable trajectories in the state and control space in two steps: generation of the training set and training of the network. First, an extended Kalman filter (EKF) processes simulated or actual measurements to minimize the effects of measurement error, to account for likely disturbance effects on the dynamical system, and to estimate the forces and moments related to the aerodynamic coefficients. The state, control, forces, and moments are the training set for a feed-forward neural network (FNN) (also called a "back-propagation network," although different training algorithms are used here). Second, the FNN, containing a single hidden layer of sigmoid or radial-basis-function "neurons," is trained on this set using a separate EKF, a genetic algorithm (GA), or a combination of the two.
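To make the second step concrete, the sketch below trains network weights with an extended Kalman filter by treating the weights as the filter state and each training sample as a scalar measurement. It is a deliberately tiny illustration, not the filters compared in this research: the single-node model $\hat{y} = \tanh(w_0 x + w_1)$, the noise values `R` and `Q`, the data, and the function names are all assumptions.

```python
import math

def ekf_train(samples, w, P, R=0.01, Q=1e-4, epochs=20):
    """EKF weight estimation for the toy model y_hat = tanh(w[0]*x + w[1])."""
    w = list(w)
    P = [row[:] for row in P]          # work on copies of the caller's data
    n = len(w)
    for _ in range(epochs):
        for x, y in samples:
            y_hat = math.tanh(w[0] * x + w[1])
            g = 1.0 - y_hat ** 2       # derivative of tanh at the node input
            H = [g * x, g]             # Jacobian of the output w.r.t. the weights
            # Time update: fictitious process noise keeps the covariance,
            # and therefore the filter gain, from collapsing to zero.
            for i in range(n):
                P[i][i] += Q
            PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
            S = sum(H[i] * PHt[i] for i in range(n)) + R
            K = [v / S for v in PHt]   # Kalman gain
            # Measurement update of the weights and their covariance.
            err = y - y_hat
            w = [w[i] + K[i] * err for i in range(n)]
            P = [[P[i][j] - K[i] * PHt[j] for j in range(n)] for i in range(n)]
    return w, P

def mse(w, samples):
    return sum((y - math.tanh(w[0] * x + w[1])) ** 2 for x, y in samples) / len(samples)

# Hypothetical noise-free training set generated from known weights.
true_w = [2.0, -0.5]
samples = [(k / 10.0, math.tanh(true_w[0] * k / 10.0 + true_w[1]))
           for k in range(-20, 21)]

w0 = [0.5, 0.0]
w_hat, P_hat = ekf_train(samples, w0, [[1.0, 0.0], [0.0, 1.0]])
```

Setting `Q` to zero recovers a filter without process noise, whose gain shrinks as samples accumulate; the nonzero value keeps the weights responsive to later data.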

1.2  Cerebellar  Model  Articulation  Controller 

The cerebellar model articulation controller (CMAC) is based on a sequence of memory and data mappings rather than on interconnected neurons. The CMAC maps an input vector to an output vector in two steps. In the first mapping, the input is discretized and quantized in the receptive region; in the second mapping, outputs of the receptive region form the conceptual or association memory that produces the desired output. There also may be a hashing region that compresses memory requirements with little degradation in performance. For the application described in Section 2.2, the CMAC operates in parallel with a proportional-integral-derivative (PID) controller, which provides initial stabilization as the CMAC learns its weights through error-gradient-based training. The CMAC gradually takes over from the PID controller, providing improved nonlinear control in the process.
1.3  Classical/Neural Synthesis of Nonlinear Control Systems

Classical/neural synthesis of control systems combines the most effective elements of old and new design concepts to produce better control systems. There is considerable precedent for applying gain-scheduled linear controllers to nonlinear systems, especially those that can be approximated as linear-parameter-varying systems; however, a means for transferring the insights gained from these linear controllers to nonlinear controllers remains to be identified. This research initiated a new approach for designing nonlinear control systems that takes advantage of prior knowledge and experience in designing linear controllers, while capitalizing on the broader capabilities of adaptive, nonlinear control theory and artificial (or computational) neural networks. Central to this new approach is the recognition that the gradients of a nonlinear control law must represent the gain matrices of an equivalent, locally linearized controller. Hence, a family of satisfactory linear controllers specified over the operating envelope of the system forms a suitable starting point for the definition of a global nonlinear controller. The initial specification for the controller, which can be represented by neural networks, retains the stability, performance, and robustness guarantees of the linearized model for small perturbations. On-line learning improves control response for large, coupled motions, accounting for differences between actual and assumed dynamic models and for nonlinear effects not captured in the linearization.


2.  Summary of Important Results

2.1  Aerodynamic  Model  Identification 

Model identification results focus on the speed and accuracy of FNN training, the effects of using alternate nodal activation functions, the ability of the network to generalize from trajectory training data, and the ability of a pre-trained network to learn a new, localized feature of the function. Six methods of training a sigmoidal FNN for model identification are compared in [3]. (Related research on the use of a GA for learning the design parameters of a linear compensator is documented in [10].) The task is to estimate the lift coefficient of an aircraft as a function of up to three variables. Of the six methods (four EKFs, a genetic algorithm, and a hybrid GA-EKF), the most successful is an extended Kalman filter with fictitious process noise. It had been anticipated that the hybrid method, which combines an initial global search by GA with the strong convergence properties of the EKF, would prove efficient, and it was four to nine times faster than either GA or EKF alone. However, the EKF with process noise proved quickest to converge, an additional five to 20 times faster than the GA-EKF.

An apparent advantage of using a radial basis function (RBF) for a nodal activation function rather than a sigmoid is that its domain of significant effect is more compact [4,8]. The sigmoid extends from zero to one across the input domain, while the RBF produces an exponential "bump" at its center, trailing off to zero elsewhere. If features of the training function change locally, it seems likely that an RBF network would learn these features more quickly than a sigmoid network and with less disturbance to the trained function elsewhere. Our investigation confirms this hypothesis. The RBF networks typically converge to the new feature several times faster than the sigmoid networks.


Nevertheless, RBF networks, as typically defined, suffer from a “curse of dimensionality,” growing exponentially and inefficiently as the number of independent variables increases. We have defined a new, hierarchical type of RBF network that is especially well-suited to on-line learning [9]. The model update is developed over time through the use of three approximations, all housed within one neural network. Gaussian radial basis functions with centers placed on grids of different resolutions serve as the network activation functions, or nodes, where each grid is assigned a set of width ranges. Network training is reduced to a nodal selection/output-weight calculation problem. The network considers two surfaces: the Baseline and the Residual-Surface. The Baseline portion of the network is initialized prior to flight. Its parameters remain fixed during system operation while the residual surface, that surface defined by the difference between actual flight data and the current Baseline Approximation, is captured by the Residual-Surface Approximation. This approximation is also built with two surfaces: the Interim and Flight Approximations. The Flight model is generated during system operation using a selection procedure based on nodal output magnitudes and the Givens least-squares algorithm for output-weight calculation. The Interim Approximation is generated at fixed time intervals, and it efficiently replaces the Flight model. The information held in the Interim model is used to update the Baseline Approximation. The Baseline and Interim surfaces are generated with a new, fast, accurate training procedure for problems that include several inputs. Models must be able to generalize, that is, to respond accurately to data not seen during training; this quality is monitored during all phases of learning and considered for approximation establishment.

2.2  Cerebellar  Model  Articulation  Controller 

Ground vehicles fueled by hydrocarbons or alcohols and powered by proton exchange membrane (PEM) fuel cells address world air quality and fuel supply concerns while avoiding hydrogen infrastructure and on-board storage problems. Instead of carrying gaseous hydrogen on vehicles, fuel cell developers are exploring on-board fuel processors that convert a hydrogen-containing fuel - such as methanol, ethanol, or gasoline - into a hydrogen-rich gas. These fuels are easier to store and distribute than hydrogen gas, and fuels such as gasoline have a production and distribution infrastructure already in place. However, a major concern when operating PEM fuel cells on the hydrogen-rich gas from a fuel processor is the poisoning of the fuel cell's anode catalyst, and thus the degradation of vehicle performance, by carbon monoxide. The gas stream or “reformate” from a fuel processor contains hydrogen, carbon dioxide, water, and carbon monoxide. Care must be taken to reduce the carbon monoxide level in the gas to only a few parts per million before it enters the fuel cell. In an on-board fuel processor, the final carbon monoxide clean-up step is performed by a relatively new catalytic reactor called the preferential oxidizer or PrOx. Because the vehicle's power demand varies continually, the PrOx must maintain this reduction of the carbon monoxide concentration under dynamic conditions to avoid poisoning the anode catalyst and thus malfunction of the fuel cell vehicle.

The CMAC has been used as a nonlinear controller for a fuel cell's PrOx [5, 6]. The gas flow rate and temperature of the PrOx have nonlinear effects on the reaction and must be carefully controlled to maximize the performance and life of the fuel cell - hence the need for an adaptive, nonlinear controller. The CMAC is shown to perform better than a PID controller alone, given slow and inaccurate sensors, rapid fuel processor transients, and systematic fuel processor changes due to aging.

A dynamic control scheme for a single-stage, tubular, cooled PrOx has been shown to perform better than conventional industrial controllers. The hybrid control system contains a CMAC artificial neural network in parallel with a conventional PID controller. A computer simulation of the preferential oxidation reactor illustrated the abilities of the controller and compared its performance to the performance of conventional controllers. Realistic input patterns were generated for the PrOx by using models of vehicle power demand and upstream fuel processor components to convert the speed sequences in the Federal Urban Driving Schedule (FUDS) to PrOx inlet temperatures, concentrations, and flow rates. The hybrid controller generalizes well to novel driving sequences after being trained on other driving sequences with similar or slower transients. Although it is similar to the PID in terms of software requirements and design effort, the hybrid controller performs significantly better than the PID in terms of hydrogen conversion setpoint regulation and PrOx outlet carbon monoxide reduction.

2.3  Classical/Neural  Synthesis  of  Nonlinear  Control  Systems 

We consider dynamic systems described by the nonlinear ordinary differential equation

$$ \dot{x}(t) = f[x(t), p(t), u(t), w(t)] \tag{6} $$

where $x$ is the $(n \times 1)$ plant state, $p$ is an $(\ell \times 1)$ vector of plant and observation parameters, $u$ is the $(m \times 1)$ control, and $w$ is an $(s \times 1)$ vector of disturbance effects. The equation may represent a “lumped-parameter” system, or it may be an approximation to an unsteady partial differential equation. Plant motions, controls, and disturbances are sensed in the $(r \times 1)$ output $y_s$,

$$ y_s(t) = h_s[x(t), p(t), u(t), w(t)] \tag{7} $$

and the measurement, $z$, is subject to uncertainty, $n$:

$$ z(t) = y_s(t) + n(t) \tag{8} $$

The design objective is to specify a control law of the general form

$$ u(t) = c[z(t), p(t), y_c(t)] \tag{9} $$

that has two properties: it achieves mission goals, as expressed by the $(r_c \times 1)$ command input, $y_c(t)$, and it furnishes adequate stability and transient response, assuring that excursions from $y_c(t)$ caused by disturbance or measurement error are acceptably small and do not require excessive control use.

The command input, $y_c(t)$, can be viewed as some desirable (possibly nonlinear) combination of state and control elements, and its dimension is less than or equal to the number of independent controls ($r_c \le m$):

$$ y_c(t) = h_c[x(t), u(t)] \tag{10} $$

It could be the result of external trajectory planning (e.g., following a prescribed, possibly optimal, path), or it may be due to a loosely defined, subjective process (e.g., the expression of a human operator’s intent).

For the discussion, we address the more limited goal of control with perfect measurements, simplifying the control law to

$$ u(t) = c[x(t), p(t), y_c(t)] \tag{11} $$

$c[x(t), p(t), y_c(t)]$ may be a functional, containing functions of its arguments, such as integrals or derivatives of $x(t)$ and $y_c(t)$. We always can write the control law as the sum of a nominal effect and a perturbed effect

$$ \begin{aligned} u(t) &= c_0[x_0(t), p(t), y_{c_0}(t)] + \Delta c[x(t), p(t), y_c(t)] \\ &= u_0(t) + \Delta u(t) \end{aligned} \tag{12} $$


where, for simplicity, we assume that $p(t)$ is known without error. Anticipated values of the state and command are $x_0$ and $y_{c_0}$, and actual values are $x$ and $y_c$, where

$$ x = x_0 + \Delta x \tag{13} $$

$$ y_c = y_{c_0} + \Delta y_c \tag{14} $$

Hence, the control law can be expressed as

$$ u(t) = c_0[x_0(t), p(t), y_{c_0}(t)] + \Delta c[x_0(t), p(t), y_{c_0}(t), \Delta x(t), \Delta y_c(t)] \tag{15} $$

For sufficiently small state and command perturbations, the perturbed control effect is linear in $\Delta x$ and $\Delta y_c$, and it can be written as

$$ \begin{aligned} \Delta u(t) = \Delta c[\cdot] &= \frac{\partial c}{\partial x}[x_0(t), p(t), y_{c_0}(t)]\,\Delta x + \frac{\partial c}{\partial y_c}[x_0(t), p(t), y_{c_0}(t)]\,\Delta y_c \\ &= C_x \Delta x + C_y \Delta y_c \end{aligned} \tag{16} $$

$C_x$ and $C_y$ contain the $m$ gradients of the control law evaluated at $[x_0(t), p(t), y_{c_0}(t)]$. Equation 16 can be viewed as a linear, gain-scheduled control law which, when combined with $c_0[\cdot]$, provides a close approximation to the exact nonlinear controller (eq. 11) for small $\Delta x$ and $\Delta y_c$:

$$ u(t) = c_0[x_0(t), p(t), y_{c_0}(t)] + C_x \Delta x + C_y \Delta y_c \tag{17} $$


It is clear that knowledge of $c[x(t), p(t), y_c(t)]$ at a single point and of $C_x$ and $C_y$ over the operating range of $[x(t), p(t), y_c(t)]$ (or some suitably dense set in the space) is equivalent to knowledge of $c[x(t), p(t), y_c(t)]$ over the same range, because the control law can be recovered by integrating its gradients outward from the known point. Put another way, given a nonlinear control in the form of eq. 9, the corresponding gain-scheduled control law is readily found. Our objective is to find efficient ways of solving the inverse problem, that is, to derive a nonlinear control law from a satisfactory gain-scheduled control law.
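The forward direction (from a nonlinear control law to its local gains) can be illustrated numerically. In the sketch below, the scalar control law `control_law`, its coefficients, and the finite-difference step are hypothetical choices made only to show that the gradients of $c$ play the role of $C_x$ and $C_y$ in eq. 17; the parameter vector $p$ is omitted for brevity.

```python
def control_law(x, yc):
    """Hypothetical scalar nonlinear control law u = c[x, yc]."""
    return -2.0 * x - 0.5 * x ** 3 + 1.5 * yc

def gains(c, x0, yc0, h=1e-6):
    """Central-difference estimates of C_x = dc/dx and C_y = dc/dyc at (x0, yc0)."""
    Cx = (c(x0 + h, yc0) - c(x0 - h, yc0)) / (2.0 * h)
    Cy = (c(x0, yc0 + h) - c(x0, yc0 - h)) / (2.0 * h)
    return Cx, Cy

def linearized_control(c, x0, yc0, dx, dyc):
    """Eq. 17: nominal control plus gain-scheduled perturbation control."""
    Cx, Cy = gains(c, x0, yc0)
    return c(x0, yc0) + Cx * dx + Cy * dyc

# Gains scheduled at two operating points of the same nonlinear law.
Cx_a, Cy_a = gains(control_law, 0.0, 0.0)
Cx_b, Cy_b = gains(control_law, 1.0, 0.0)

# Near the operating point the linearized law tracks the nonlinear one.
u_exact = control_law(1.01, 0.02)
u_linear = linearized_control(control_law, 1.0, 0.0, 0.01, 0.02)
```

Here the state gain changes from one operating point to the other, which is exactly the information a gain schedule stores; the inverse problem discussed above is to reconstruct `control_law` from such a table of gains.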
Gain-scheduled control laws are based on a set of linear, time-invariant (LTI) control laws specified throughout the plant’s operating region. Given the dynamic system of eq. 6, a first-degree expansion can be written:

$$ \begin{aligned} \dot{x}(t) &= \dot{x}_0(t) + \Delta\dot{x}(t) \\ &\approx f[x_0(t), p(t), u_0(t), w_0(t)] + \frac{\partial f}{\partial x}\Delta x(t) + \frac{\partial f}{\partial u}\Delta u(t) + \frac{\partial f}{\partial w}\Delta w(t) \\ &= f[\cdot] + F\,\Delta x(t) + G\,\Delta u(t) + L\,\Delta w(t) \end{aligned} \tag{18} $$

The Jacobian matrices, $F$, $G$, and $L$, are evaluated at selected operating points, and the perturbation model is

$$ \begin{aligned} \Delta\dot{x}(t) = {} & F[x_0(t), p(t), u_0(t), w_0(t)]\,\Delta x(t) + G[x_0(t), p(t), u_0(t), w_0(t)]\,\Delta u(t) \\ & + L[x_0(t), p(t), u_0(t), w_0(t)]\,\Delta w(t) \end{aligned} \tag{19} $$

This model is almost a linear, parameter-varying (LPV) plant, “almost” because the system matrices depend on $x_0(t)$ as well as the remaining variables. In most applications, effects of parameter variation are ignored because time-varying dynamic effects are small, and $[F, G, L]$ is treated as a set of LTI plant models. Linear control gains (e.g., $C_x$ and $C_y$) are computed for each LTI model, and the control law is implemented with interpolation of gains to intermediate operating points. In past applications, the number of interpolating variables has been kept small.
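The Jacobians of eqs. 18 and 19 can be approximated numerically at each operating point. The sketch below uses central differences on a hypothetical plant (a damped pendulum with torque input); the plant model, its coefficients, and the helper names `jacobian` and `linearize` are assumptions for illustration, and disturbances and parameters are omitted.

```python
import math

def f(x, u):
    """Hypothetical plant: damped pendulum, x = [angle, rate], u = [torque]."""
    return [x[1], -9.81 * math.sin(x[0]) - 0.1 * x[1] + u[0]]

def jacobian(fun, z0, h=1e-6):
    """Central-difference Jacobian of fun with respect to the vector z0."""
    n_out = len(fun(z0))
    J = [[0.0] * len(z0) for _ in range(n_out)]
    for j in range(len(z0)):
        zp, zm = list(z0), list(z0)
        zp[j] += h
        zm[j] -= h
        fp, fm = fun(zp), fun(zm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J

def linearize(x0, u0):
    """Return F = df/dx and G = df/du evaluated at the operating point (x0, u0)."""
    F = jacobian(lambda x: f(x, u0), x0)
    G = jacobian(lambda u: f(x0, u), u0)
    return F, G

# A small family of LTI models indexed by the nominal angle.
operating_angles = [0.0, math.pi / 6.0, math.pi / 3.0]
models = {a: linearize([a, 0.0], [0.0]) for a in operating_angles}
F_hanging, G_hanging = models[0.0]
F_raised, G_raised = models[math.pi / 3.0]
```

Each entry of `models` is one LTI plant of the kind for which linear gains would be designed; the variation of `F` with the nominal angle is what makes interpolation of those gains necessary.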

Future  research  will  identify  a  means  of  greatly  expanding  the  number  of 
(independent)  interpolating  variables,  affording  an  improvement  in  comparison  to  gain- 
scheduled  controllers.  More  important,  it  will  provide  an  excellent  initialization  point  for 
the  neural-network  controller  [7].  Given  Cx,  Cy,  and  the  corresponding  equilibrium  points 
at  each  operating  point,  the  corresponding  nonlinear  control  law  (eq.  11)  will  be  gener- 


We assume that the LTI control laws used for this pre-training phase satisfy accepted engineering design criteria, based on design principles described in our earlier work. For example, we have shown how to use Monte Carlo evaluation and genetic algorithms to design robust linear, quadratic-Gaussian (LQG) controllers that satisfy classical design criteria. That process begins with conventional stability and performance specifications (e.g., negative eigenvalues, suitable limits on rise time, settling time, and control usage), and it generates desired values of $C_x$, $C_y$, and the equilibrium points. In the process, the corresponding weighting matrices for quadratic cost functions, such as

$$ J = \lim_{t_f \to \infty} \frac{1}{2} \int_{t_0}^{t_f} L[x(\tau), u(\tau)]\, d\tau = \lim_{t_f \to \infty} \frac{1}{2} \int_{t_0}^{t_f} \left[ x^T(\tau) Q x(\tau) + 2 x^T(\tau) M u(\tau) + u^T(\tau) R u(\tau) \right] d\tau \tag{20} $$

are found. These cost functions become critical elements for on-line learning.
On-line training of a neural network is based on the minimization of an error function; the error function chosen here takes the form of eq. 20. The result is a dynamic programming problem, in which the nonlinear control law minimizes the expected value of a cost function such as eq. 20. We replace the cost function, $J$, by the value function $V$, which is defined as

$$ V(t) = \lim_{t_f \to \infty} \frac{1}{2} \int_{t}^{t_f} L[x(\tau), u(\tau)]\, d\tau \tag{21} $$

The value function is minimized by choice of control via the Hamilton-Jacobi-Bellman (HJB) equation

$$ \frac{\partial V^*}{\partial t}[x^*(t), t] = -\min_{u} \left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}[x^*(t), t]\, f[x^*(t), u(t), t] \right\} \tag{22} $$

Because the problem is stochastic, the value function is recast as the expected value of the integral in eq. 21:

$$ V(t) = E\left\{ \lim_{t_f \to \infty} \frac{1}{2} \int_{t}^{t_f} L[x(\tau), u(\tau)]\, d\tau \right\} \tag{23} $$


Hence, the elements of eq. 22 are expectations rather than deterministic functions. With sufficient smoothness, the corresponding control, $u^*(t)$, along the optimal trajectory, $x^*(t)$, is specified by the optimality condition

$$ 0 = \frac{\partial}{\partial u} \left\{ L[x^*(t), u^*(t), t] + \frac{\partial V^*}{\partial x}[x^*(t), t]\, f[x^*(t), u^*(t), t] \right\} \tag{24} $$

While this condition implicitly specifies the control, our goal is to derive an explicit relation in the form of eq. 11. Solution of this problem is afforded by the adaptive critic architecture. This architecture consists of an action network, which expresses the control law (eq. 11), plus a critic network, which estimates the $\partial V^*/\partial x$ required in eq. 24. For the linear case, optimal control perturbations can be calculated as

$$ \Delta u(t) = -R^{-1} G^T \left( \frac{\partial V^*}{\partial x} \right)^T = -R^{-1} G^T P(t)\, \Delta x(t) = C_x \Delta x(t) \tag{25} $$


where $P(t)$ is the solution to the well-known matrix Riccati equation. Hence, there is an intimate relationship between $\partial V^*/\partial x$ and the gradient of the control surface, $C_x$.
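For a scalar, time-invariant plant, the relationship in eq. 25 can be worked out in closed form. The sketch below assumes a perturbation model $\Delta\dot{x} = f\,\Delta x + g\,\Delta u$, the cost of eq. 20 with $Q = q$, $R = r$, and no cross-weighting ($M = 0$), and the steady-state Riccati solution; the numerical values and the function name `scalar_lqr` are hypothetical.

```python
import math

def scalar_lqr(f, g, q, r):
    """Steady-state scalar Riccati solution and the resulting gain C_x.

    P is the positive root of 0 = 2*f*P - (g*P)**2 / r + q,
    and C_x = -g*P/r is eq. 25 with scalar R, G, and P.
    """
    P = r * (f + math.sqrt(f * f + g * g * q / r)) / (g * g)
    Cx = -g * P / r
    return P, Cx

# Hypothetical unstable plant: dx/dt = x + u, with q = 3 and r = 1.
f_, g_, q_, r_ = 1.0, 1.0, 3.0, 1.0
P, Cx = scalar_lqr(f_, g_, q_, r_)

# With V* = 0.5 * P * dx**2, the critic quantity dV*/dx equals P * dx,
# so the optimal perturbation control is -(g / r) * dV*/dx = C_x * dx.
dx = 0.2
dV_dx = P * dx
du = -(g_ / r_) * dV_dx

closed_loop_pole = f_ + g_ * Cx
riccati_residual = 2.0 * f_ * P - (g_ * P) ** 2 / r_ + q_
```

In this example the gain is fixed by `P` alone, which is the scalar version of the statement that the critic's estimate of $\partial V^*/\partial x$ and the control gradient $C_x$ carry the same information.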